This is an example pipeline running two tumor/normal pairs through the hg19 JaBbA pipeline via Flow. There are a few steps to setting up and running these jobs. After getting through the pipeline, we run a post-JaBbA job and work towards visualizing some of the plots.
The pipeline that we will run through is covered here:
Here we align fastq files into bams for tumor/normal paired samples for WGS. Next, we run a series of jobs whose outputs are required inputs for the JaBbA pipeline. However, since the alignment of WGS samples can take upwards of a week, we will start off with a couple of aligned hg19 BAM files and work from there.
This is our pairs table in its purest form: bam files for the associated tumor and normal samples. We will run these samples through the pipeline:
pairs = readRDS("db/pairs.rds")
pairs
## pair
## 40 25008780
## 49 25023660
## normal_bam
## 40 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25008780-151/Sample_25008780-151.bam
## 49 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25023660-549/Sample_25023660-549.bam
## tumor_bam
## 40 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25008780-DNAt/Sample_25008780-DNAt.bam
## 49 /gpfs/commons/groups/imielinski_lab/projects/DavisBRCA/Flow/ChaoJul2022/BWAMem_Fast/Sample_25023660DNAt/Sample_25023660DNA.bam
There is intentionally something wrong with one of these samples: try and figure this out!
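One way to sniff it out (a minimal sketch, assuming the pairs table loaded above): every bam path in the table should resolve to an existing file, so a mismatched directory/basename will show up as a missing file.

```r
# Sanity-check a pairs table: return the rows whose bam paths do not
# resolve to existing files (e.g. a typo'd sample name in the path).
check_bams <- function(pairs) {
  ok <- file.exists(pairs$tumor_bam) & file.exists(pairs$normal_bam)
  pairs[!ok, , drop = FALSE]
}

# check_bams(readRDS("db/pairs.rds"))  # returns the offending row(s), if any
```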
Once we have our generated bams, we are going to run a series of jobs on them to generate the prerequisites for running JaBbA. These jobs all need to be run on the bams, and they can all be run in tandem. So let's try to get one of these going.
The first one is the HetPileups caller, which generates a pileup of our allele fractions for a sample.
cat ~/tasks/hg19/core/HetPileups.task
## # HetPileupsWGS urn:lsid:broadinstitute.org:cancer.genome.analysis:10898:6
## ~/modules/Pileup
## input TumorBam tumor_bam_wgs path
## input KeepOnlyHets "TRUE" value
## input NormalBam normal_bam_wgs path
## input NumCores cores value "4"
## input MaxDepth max_depth value "1000" ## default is 500 (from bamUtils::mafcount > bamUtils::varcount)
## input SitesVCF hapmap.sites path "/gpfs/commons/groups/imielinski_lab/DB/references/hg19/hapmap_3.3.b37.vcf.gz"
## output het_pileups_wgs sites.txt$
Here we see a few arguments that will need to be filled in, and there are a couple of columns in our pairs table that may need to be renamed. The task text table format is something like this:
| input or output | value name inside the Job class | input label/colname of the file | class expected of input |
|---|---|---|---|
For more information on how to set up jobs, the wiki on Flow is a good source to follow.
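For example, if your pairs table has tumor_bam/normal_bam columns but the task above expects the input labels tumor_bam_wgs/normal_bam_wgs, a quick rename does it. This is a sketch assuming data.table (which Flow already depends on):

```r
library(data.table)

pairs <- readRDS("db/pairs.rds")
setDT(pairs, keep.rownames = TRUE)

# rename columns to match the input labels in HetPileups.task
setnames(pairs, c("tumor_bam", "normal_bam"),
                c("tumor_bam_wgs", "normal_bam_wgs"))
```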
These jobs/tasks need to be run on the bams and can all be done in tandem. If all goes well, expect them to finish within a few hours.
Here’s a table of the BAM jobs to run:
| tool | task location | description |
|---|---|---|
| Svaba | ~/tasks/hg19/core/Svaba.task | SV caller |
| Strelka | ~/tasks/hg19/core/Strelka2 | SNV caller |
| fragcounter (on tumor) | ~/tasks/hg19/core/fragCounter.task | fragment counting across the genome with GC and mappability corrections |
| fragcounter (on normal) | ~/tasks/hg19/core/fragCounter.normal.task | same as above but for normal |
| HetPileup | ~/tasks/hg19/core/HetPileups.task | get allele fractions for het sites across the genome |
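To actually launch one of these (HetPileups as the example), something along these lines should work. The Job/srun calls here are assumptions based on the Flow wiki, so double-check the exact constructor arguments and submit verb for your Flow version and scheduler:

```r
library(Flow)

# parse the task definition, then stamp out one job per row of the pairs table
tsk  <- Task("~/tasks/hg19/core/HetPileups.task")
jobs <- Job(tsk, pairs)   # constructor arguments assumed; see the Flow wiki

srun(jobs)  # submit to the cluster; the submit call depends on your scheduler
```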
After all of these steps have run and been processed, remember to merge their outputs together and save them into our pairs table.
Alternatively, one can also use GRIDSS/GRIPSS as a SV and junction breakpoint detector.
If all is working well, we can begin processing some of the bam outputs in preparation for JaBbA. One of these steps is dryclean, which further processes the fragcounter outputs, using rPCA with a panel of normals (PON) to separate signal from noise.
Dryclean our fragcounter outputs for both tumor and normal.
The tasks associated with dryclean:
dryclean normal: ~/tasks/hg19/core_dryclean/Dryclean.normal.task
dryclean tumor: ~/tasks/hg19/core_dryclean/Dryclean.task
Then we use CBS to get our segments ready for JaBbA.
After this step is done, run CovCBS: ~/tasks/hg19/core/CBS.task
MERGE ALL OF THIS INTO YOUR PAIRS TABLE BEFORE PROCEEDING.
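An illustrative merge (the output column names are assumptions): after a job finishes, bring its output path column back into the pairs table keyed on the pair id, then overwrite the saved table so downstream tasks can find the new columns.

```r
pairs <- readRDS("db/pairs.rds")

# cbs_out is illustrative -- in practice, collect the output paths reported
# by the Flow job for each pair
cbs_out <- data.frame(pair         = pairs$pair,
                      cbs_seg_rds  = "<path from the CBS job>",
                      cbs_nseg_rds = "<path from the CBS job>",
                      stringsAsFactors = FALSE)

pairs <- merge(pairs, cbs_out, by = "pair")
saveRDS(pairs, "db/pairs.rds")
```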
ASCAT purity/ploidy estimation (and related estimates) can be done separately here. This job is not strictly necessary, since JaBbA does its own estimation, but it can help in a few fringe cases.
The task is: ~/tasks/ascat_seg.task
Now that we have all the right inputs, it’s time to run JaBbA.
The task is located here: ~/tasks/JaBbA_ZC_dev.task
and looking into the task:
Flow::Task("~/tasks/JaBbA_ZC_dev.task")
## No methods found in package 'data.table' for request: 'key' when loading 'Flow'
## #Module JaBbA_ZC_dev [Task: JaBbA_ZC_dev path: ~/tasks/JaBbA_ZC_dev.task] ("sh <libdir>/run.sh <RAfile> <CovFile> --verbose --name <TumorName> --seg <SegFile> --field <CovField...")
## ~/modules/JaBbA_ZC_dev/
## input CovFile cov_rds path
## input RAfile junctionFilePath path
## input CovField field value ratio
## input RAsupp junctionUnfiltered path /dev/null
## input TierFieldName tfield value tier
## input NormalSegFile cbs_nseg_rds path /dev/null
## input SegFile cbs_seg_rds path /dev/null
## input SlackPenalty slack value 100
## input OptionalHetPileupOutput het_pileups_wgs path /dev/null
## input Purity purity value NA
## input Ploidy ploidy value NA
## input tilim tilim value 6000
## input epgap epgap value 1e-8
## input ppmethod pp.method value ppgrid
## input maxna maxna value 0.8
## input flags flags value
## input blacklist.coverage blacklist.coverage path /dev/null
## input blacklist.junctions blacklist.junctions path /dev/null
## input NumIterations iter value 0
## input TumorName pair value tumor
## input job.spec.memory "15" value
## input indel indel value exclude
## input cnsignif cnsignif value 0.00001
## input lp lp value TRUE
## input ism ism value TRUE
## input mem treemem value 16
## input fix.thres fix.thres value -1
## input gurobi gurobi value FALSE
## input nonintegral nonintegral value FALSE
## output jabba_rds jabba.simple.rds$
## output jabba_gg jabba.simple.gg.rds$
## output jabba_vcf jabba.simple.vcf$
## output jabba_raw_rds jabba.raw.rds$
## output opti opt.report.rds$
## output jabba_seg jabba.seg$
## output karyograph karyograph.rds$
We have a bunch of values that will need to be filled in. Here are some of the inputs that will need to be altered:
- cov_rds should be tumor_dryclean_cov
- junctionFilePath should be svaba_somatic_vcf
- CovField should be foreground if you are following the blog file
- cbs_seg_rds should match the cbs outputs from the merge.
- cbs_nseg_rds should match the cbs outputs from the merge.
- het_pileups_wgs should be the het_pileup sites.txt output for that file.
- Purity and ploidy values can be precalculated via the ascat job and inputted here; otherwise the task and JaBbA will estimate these for you.
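Mapping these onto the pairs table can look like the sketch below (assuming data.table, and assuming the merged pairs table has columns named tumor_dryclean_cov and svaba_somatic_vcf; adjust to whatever your merges actually produced):

```r
library(data.table)

pairs <- readRDS("db/pairs.rds")
setDT(pairs)

# point the JaBbA input labels at the merged output columns
pairs[, cov_rds          := tumor_dryclean_cov]
pairs[, junctionFilePath := svaba_somatic_vcf]
pairs[, field            := "foreground"]
```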
There are a few other estimated parameters that can be played around with for the fits, and in your free time, I would recommend playing around with these to see how the solver is affected by the various arguments.
From here, run the program, and we should be good to go. Expect each pair sample to take at most a few hours. There will occasionally be cases where the solver in JaBbA fails to converge, and it will become important to play around with parameters to solve for cases like these.
Most post-JaBbA tasks relate either to the sample as a whole or to the gGraph structure generated by the JaBbA job. Here we will go into a simple post-JaBbA task, which is "event calling" via gGnome. We run the task ~/tasks/Events.task, which calls the event caller from gGnome. The caller uses set thresholds and criteria to identify the types of classes and events found within each sample.
Most of these jobs like Fusions, Alterations, and Events are quick, so we can expect a reasonably fast turnaround.
Finally, plotting is done via gGnome. Let’s try and plot some of our JaBbA outputs.
Use the gGnome tutorial for reference: Link
Using the complex JaBbA output from the Events job:
1. Plot a basic SV event via gGraph.
2. Tweak the parameters around the range of the event to extend it by 10 kbp on each side.
3. Generate a gWalk of one of these SV events.
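As a starting point for these exercises, here is a rough sketch assuming gGnome's gG() loader and gTrack-based plotting from the tutorial linked above; the window coordinates are made up for illustration, and the `+` padding operator on GRanges comes from mskilab's gUtils:

```r
library(gGnome)
library(gUtils)  # provides the GRanges `+` padding operator

# load the JaBbA output as a genome graph (path from the JaBbA outputs above)
gg <- gG(jabba = "jabba.simple.gg.rds")

# hypothetical event locus; `+ 1e4` pads the window by 10 kbp on each side
win <- GRanges("8:128000000-129000000") + 1e4

plot(gg$gt, win)  # gTrack plot of the graph over the padded window
```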
101 on Flow Usage: http://mskiweb/101/flow.html
Links to tools within the pipeline:
1. Svaba
2. Strelka2
3. fragcounter
4. Hetpileups (an internal tool; the script is found here: ~/modules/Pileup/Pileup.R)
5. dryclean
6. CBS
7. ascat
8. JaBbA
9. gGnome